Conversation

@KRRT7 (Contributor) commented May 18, 2025

User description

Benchmark 1: main
  Time (mean ± σ):      9.103 s ±  0.044 s    [User: 7.109 s, System: 1.990 s]
  Range (min … max):    9.046 s …  9.157 s    10 runs
 
Benchmark 2: optimizer
  Time (mean ± σ):      4.607 s ±  0.011 s    [User: 4.066 s, System: 0.532 s]
  Range (min … max):    4.594 s …  4.627 s    10 runs
 
Summary
  optimizer ran
    1.98 ± 0.01 times faster than main

where optimizer is this branch


PR Type

Enhancement, Tests


Description

  • Optimize AST return detection performance.

  • Refactor tracer internals with locking and caching.

  • Introduce comprehensive tracer unit tests.

  • Remove CI coverage upload and pytest-cov.


Changes walkthrough 📝

Relevant files

Enhancement

functions_to_optimize.py — Optimize return statement detection
(codeflash/discovery/functions_to_optimize.py)
  • Implement DFS-based return detection for functions.
  • Early exit upon finding a Return node.
  • Replace ast.walk for improved performance.
  • +8/-1

tracer.py — Refactor tracer locking, caching, commits
(codeflash/tracer.py)
  • Introduce threading.Lock for DB access safety.
  • Move commits inside locked context blocks.
  • Add path_cache for resolved file paths.
  • Improve class detection with contextlib.suppress.
  • +137/-77

Tests

test_tracer.py — Add comprehensive tracer tests
(tests/test_tracer.py)
  • Add unit tests covering tracer behaviors.
  • Validate disable, timeout, filtering, threading.
  • Test simulate_call, runctx, exception handling.
  • +407/-0

Configuration changes

unit-tests.yaml — Remove CI Codecov upload step
(.github/workflows/unit-tests.yaml)
  • Remove Codecov upload step from CI.
  • Simplify pytest command in workflow.
  • +1/-7

Dependencies

pyproject.toml — Remove pytest-cov dependency
(pyproject.toml)
  • Remove pytest-cov dependency.
  • +0/-1

    github-actions bot commented May 18, 2025

    PR Reviewer Guide 🔍

    (Review updated until commit fc4f2de)

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
    🧪 PR contains tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Return detection

    The custom DFS in function_has_return_statement may detect Return nodes in nested functions or comprehensions,
    leading to false positives. Ensure returns are only detected in the intended function scope.

    # Custom DFS, return True as soon as a Return node is found
    stack = [function_node]
    while stack:
        node = stack.pop()
        if isinstance(node, ast.Return):
            return True
        stack.extend(ast.iter_child_nodes(node))
    return False
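To make the concern concrete: because the quoted DFS visits every descendant node, a `return` inside a nested helper is attributed to the enclosing function. The snippet below mirrors the DFS above in a standalone function and shows the false positive:

```python
import ast

def function_has_return_statement(function_node):
    # DFS as quoted above: visits every descendant, including nested defs
    stack = [function_node]
    while stack:
        node = stack.pop()
        if isinstance(node, ast.Return):
            return True
        stack.extend(ast.iter_child_nodes(node))
    return False

# `outer` itself never returns, but its nested helper does:
src = """
def outer():
    def inner():
        return 1
    inner()
"""
outer = ast.parse(src).body[0]
print(function_has_return_statement(outer))  # True: a false positive for `outer`
```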
    Thread safety

    The shared path_cache dictionary is modified without synchronization in tracer_logic,
    which could lead to race conditions when tracing concurrently in multiple threads.

    co_filename = code.co_filename
    if co_filename in self.path_cache:
        file_name = self.path_cache[co_filename]
    else:
        file_name = Path(co_filename).resolve()
        self.path_cache[co_filename] = file_name
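One way to address this is to guard both the read and the write with a lock; the `PathCache` class below is a hypothetical sketch, not the PR's API. (Under CPython's GIL, single dict reads and writes are atomic, so the unsynchronized version's worst case is redundant recomputation rather than corruption, but an explicit lock makes the intent unambiguous.)

```python
import threading
from pathlib import Path

class PathCache:
    """Hypothetical synchronized path cache; illustrative names, not the tracer's API."""

    def __init__(self):
        self._cache = {}
        self._lock = threading.Lock()

    def resolve(self, co_filename):
        with self._lock:
            cached = self._cache.get(co_filename)
        if cached is not None:
            return cached
        # Resolve outside the lock: it may hit the filesystem and is idempotent.
        resolved = Path(co_filename).resolve()
        with self._lock:
            # Concurrent writers compute the same value, so last-writer-wins is safe.
            self._cache[co_filename] = resolved
        return resolved
```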
    Timeout condition

    The check if None is not self.timeout is unconventional and may be confusing.
    Consider using if self.timeout is not None for clarity and consistency.

    if None is not self.timeout and (time.time() - self.start_time) > self.timeout:

    @github-actions
    Copy link

    github-actions bot commented May 18, 2025

    PR Code Suggestions ✨

    Latest suggestions up to fc4f2de
    Explore these optional code suggestions:

    General
    Move pickling outside lock

    The pickling and dill operations are expensive and currently held inside the
    database lock, blocking other threads.
    Move the serialization logic before acquiring self._db_lock, and keep the lock only
    around cursor execution and commit.
    This reduces contention and improves threading throughput.

    codeflash/tracer.py [362-414]

    +# Serialize arguments before locking
    +original_recursion_limit = sys.getrecursionlimit()
    +sys.setrecursionlimit(10000)
    +arguments_copy = dict(arguments.items())
    +if class_name and code.co_name == "__init__" and "self" in arguments_copy:
    +    del arguments_copy["self"]
    +try:
    +    local_vars = pickle.dumps(arguments_copy, protocol=pickle.HIGHEST_PROTOCOL)
    +except (TypeError, pickle.PicklingError, AttributeError, RecursionError, OSError):
    +    try:
    +        local_vars = dill.dumps(arguments_copy, protocol=dill.HIGHEST_PROTOCOL)
    +    except (TypeError, dill.PicklingError, AttributeError, RecursionError, OSError):
    +        self.function_count[function_qualified_name] -= 1
    +        sys.setrecursionlimit(original_recursion_limit)
    +        return
    +sys.setrecursionlimit(original_recursion_limit)
    +
     with self._db_lock:
    -    # Check connection again inside lock, in case __exit__ closed it.
         if self.con is None:
             return
    -
         cur = self.con.cursor()
    -
    -    t_ns = time.perf_counter_ns()
    -    original_recursion_limit = sys.getrecursionlimit()
    -    try:
    -        # pickling can be a recursive operator, so we need to increase the recursion limit
    -        sys.setrecursionlimit(10000)
    -        # We do not pickle self for __init__ to avoid recursion errors, and instead instantiate its class
    -        # directly with the rest of the arguments in the replay tests. We copy the arguments to avoid memory
    -        # leaks, bad references or side effects when unpickling.
    -        arguments_copy = dict(arguments.items())  # Use the local 'arguments' from frame.f_locals
    -        if class_name and code.co_name == "__init__" and "self" in arguments_copy:
    -            del arguments_copy["self"]
    -        local_vars = pickle.dumps(arguments_copy, protocol=pickle.HIGHEST_PROTOCOL)
    -        sys.setrecursionlimit(original_recursion_limit)
    -    except (TypeError, pickle.PicklingError, AttributeError, RecursionError, OSError):
    -        # we retry with dill if pickle fails. It's slower but more comprehensive
    -        try:
    -            sys.setrecursionlimit(10000)  # Ensure limit is high for dill too
    -            # arguments_copy should be used here as well if defined above
    -            local_vars = dill.dumps(
    -                arguments_copy if "arguments_copy" in locals() else dict(arguments.items()),
    -                protocol=dill.HIGHEST_PROTOCOL,
    -            )
    -            sys.setrecursionlimit(original_recursion_limit)
    -        except (TypeError, dill.PicklingError, AttributeError, RecursionError, OSError):
    -            self.function_count[function_qualified_name] -= 1
    -            return
         cur.execute(
             "INSERT INTO function_calls VALUES(?, ?, ?, ?, ?, ?, ?, ?)",
             (
                 event,
                 code.co_name,
                 class_name,
                 str(file_name),
                 frame.f_lineno,
                 frame.f_back.__hash__(),
    -            t_ns,
    +            time.perf_counter_ns(),
                 local_vars,
             ),
         )
         self.trace_count += 1
         self.next_insert -= 1
         if self.next_insert == 0:
             self.next_insert = 1000
             self.con.commit()
Suggestion importance [1-10]: 6

Why: Moving expensive serialization outside the self._db_lock reduces lock hold time and improves multithreaded throughput without changing behavior.

Impact: Low
    Skip nested defs in return check

    The current DFS will count return statements inside nested functions or
    comprehensions as belonging to the outer function.
    To only detect returns in the top-level body, skip descending into inner function or
    class definitions.

    codeflash/discovery/functions_to_optimize.py [603-609]

    -stack = [function_node]
    +stack = list(function_node.body)
     while stack:
         node = stack.pop()
         if isinstance(node, ast.Return):
             return True
    +    # skip nested function/class definitions
    +    if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
    +        continue
         stack.extend(ast.iter_child_nodes(node))
     return False
Suggestion importance [1-10]: 6

Why: Preventing descent into inner FunctionDef, AsyncFunctionDef, or ClassDef ensures only the top-level function body's return statements are detected, improving correctness.

Impact: Low
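A quick self-contained check that the suggested variant no longer descends into nested definitions (the function body is a copy of the suggested code):

```python
import ast

def function_has_return_statement(function_node):
    # Suggested variant: start from the function body and skip nested defs
    stack = list(function_node.body)
    while stack:
        node = stack.pop()
        if isinstance(node, ast.Return):
            return True
        # Skip nested function/class definitions so their returns don't count
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            continue
        stack.extend(ast.iter_child_nodes(node))
    return False

outer = ast.parse(
    "def outer():\n    def inner():\n        return 1\n    inner()\n"
).body[0]
print(function_has_return_statement(outer))  # False: inner's return is ignored
```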
    Reduce lock scope in __exit__

    The console output and stats computation are performed inside the database lock,
    unnecessarily extending the locked section.
    Move console.rule, self.create_stats, self.make_pstats_compatible, and
    self.print_stats outside of the lock,
    so only the actual DB operations remain protected.

    codeflash/tracer.py [192-235]

    +# end of tracing: commit pending tracer_logic
    +self.con.commit()
    +
    +# perform stats and printing outside lock
    +console.rule("Codeflash: Traced Program Output End", style="bold blue")
    +self.create_stats()
    +self.make_pstats_compatible()
    +self.print_stats("tottime")
    +
     with self._db_lock:
         if self.con is None:
             return
    -
    -    self.con.commit()  # Commit any pending from tracer_logic
    -    console.rule("Codeflash: Traced Program Output End", style="bold blue")
    -    self.create_stats()  # This calls snapshot_stats which uses self.timings
    -
         cur = self.con.cursor()
         cur.execute(
             "CREATE TABLE pstats (filename TEXT, line_number INTEGER, function TEXT, class_name TEXT, "
             "call_count_nonrecursive INTEGER, num_callers INTEGER, total_time_ns INTEGER, "
             "cumulative_time_ns INTEGER, callers BLOB)"
         )
         for func, (cc, nc, tt, ct, callers) in self.stats.items():
             remapped_callers = [{"key": k, "value": v} for k, v in callers.items()]
             cur.execute(
                 "INSERT INTO pstats VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
                 (
                     str(Path(func[0]).resolve()),
                     func[1],
                     func[2],
                     func[3],
                     cc,
                     nc,
                     tt,
                     ct,
                     json.dumps(remapped_callers),
                 ),
             )
         self.con.commit()
    -
    -    self.make_pstats_compatible()  # Modifies self.stats and self.timings in-memory
    -    self.print_stats("tottime")  # Uses self.stats, prints to console
    -
    -    cur = self.con.cursor()  # New cursor
         cur.execute("CREATE TABLE total_time (time_ns INTEGER)")
         cur.execute("INSERT INTO total_time VALUES (?)", (self.total_tt,))
         self.con.commit()
         self.con.close()
    -    self.con = None  # Mark connection as closed
    +    self.con = None
Suggestion importance [1-10]: 5

Why: Pulling console.rule, stats generation, and printing outside the DB lock minimizes unnecessary blocking, keeping only the critical database operations protected.

Impact: Low
    Use idiomatic None check

    The comparison None is not self.timeout is unconventional and can be confusing to
    readers.
    Revert it to the more idiomatic self.timeout is not None for clarity.

    codeflash/tracer.py [275-279]

    -if None is not self.timeout and (time.time() - self.start_time) > self.timeout:
    +if self.timeout is not None and (time.time() - self.start_time) > self.timeout:
         sys.setprofile(None)
         threading.setprofile(None)
         console.print(f"Codeflash: Timeout reached! Stopping tracing at {self.timeout} seconds.")
         return
Suggestion importance [1-10]: 3

Why: Changing to the more conventional self.timeout is not None improves readability without affecting functionality.

Impact: Low

    Previous suggestions

    Suggestions up to commit 60af8b7
    Possible issue
    Release DB lock before pickling

    Avoid holding the database lock during expensive serialization; move the pickling of
    arguments_copy outside of the with self._db_lock: block and only acquire the lock
    for the cursor execution and commit.

    codeflash/tracer.py [362-378]

    +t_ns = time.perf_counter_ns()
    +...
    +local_vars = pickle.dumps(arguments_copy, protocol=pickle.HIGHEST_PROTOCOL)
     with self._db_lock:
    +    if self.con is None:
    +        return
         cur = self.con.cursor()
    -    t_ns = time.perf_counter_ns()
    -    ...
    -    local_vars = pickle.dumps(arguments_copy, protocol=pickle.HIGHEST_PROTOCOL)
    -    ...
         cur.execute(
             "INSERT INTO function_calls VALUES(?, ?, ?, ?, ?, ?, ?, ?)",
             (..., local_vars),
         )
    +    if self.next_insert == 0:
    +        self.con.commit()
Suggestion importance [1-10]: 7

Why: Moving expensive serialization outside the DB lock reduces contention and improves concurrency without altering logic.

Impact: Medium
    General
    Add IF NOT EXISTS to pstats

    Prevent errors when the table already exists by using IF NOT EXISTS in your CREATE
    TABLE statements.

    codeflash/tracer.py [201-204]

     cur.execute(
    -    "CREATE TABLE pstats (filename TEXT, line_number INTEGER, function TEXT, class_name TEXT, "
    +    "CREATE TABLE IF NOT EXISTS pstats (filename TEXT, line_number INTEGER, function TEXT, class_name TEXT, "
         "call_count_nonrecursive INTEGER, num_callers INTEGER, total_time_ns INTEGER, "
         "cumulative_time_ns INTEGER, callers BLOB)"
     )
Suggestion importance [1-10]: 5

Why: Addition of IF NOT EXISTS to the pstats table creation prevents errors on reruns without changing functionality.

Impact: Low
    Add IF NOT EXISTS to total_time

    Use IF NOT EXISTS to avoid failures if the total_time table is already present.

    codeflash/tracer.py [231]

    -cur.execute("CREATE TABLE total_time (time_ns INTEGER)")
    +cur.execute("CREATE TABLE IF NOT EXISTS total_time (time_ns INTEGER)")
Suggestion importance [1-10]: 5

Why: Guarding the total_time table creation with IF NOT EXISTS avoids table-already-exists errors on repeated tracer runs.

Impact: Low
    Use proper None check

    Use the conventional is not None check to improve readability and prevent potential
    confusion.

    codeflash/tracer.py [275]

    -if None is not self.timeout and (time.time() - self.start_time) > self.timeout:
    +if self.timeout is not None and (time.time() - self.start_time) > self.timeout:
Suggestion importance [1-10]: 3

Why: Changing None is not self.timeout to the more idiomatic self.timeout is not None improves readability with no behavior change.

Impact: Low

    @KRRT7 KRRT7 force-pushed the tracer-optimization branch from 165b5fb to ff3222b Compare May 23, 2025 02:10

    openhands-ai bot commented May 29, 2025

    Looks like there are a few issues preventing this PR from being merged!

    • GitHub Actions are failing:
      • Lint

    If you'd like me to help, just leave a comment, like

    @OpenHands please fix the failing actions on PR #215

    Feel free to include any additional details that might help me get this PR into a better state.


    @KRRT7 KRRT7 force-pushed the tracer-optimization branch from 291b3f9 to ee4c7ad Compare May 30, 2025 05:02
    codeflash-ai bot added a commit that referenced this pull request May 30, 2025
    …`tracer-optimization`)
    
Here is your optimized code. The optimization targets the **`trace_dispatch_return`** function specifically, which you profiled. The key performance wins are:
    
    - **Eliminate redundant lookups**: When repeatedly accessing `self.cur` and `self.cur[-2]`, assign them to local variables to avoid repeated list lookups and attribute dereferencing.
    - **Rearrange logic**: Move cheapest, earliest returns to the top so unnecessary code isn't executed.
    - **Localize attribute/cache lookups**: Assign `self.timings` to a local variable.
    - **Inline and combine conditions**: Combine checks to avoid unnecessary attribute lookups or `hasattr()` calls.
    - **Inline dictionary increments**: Use `dict.get()` for fast set-or-increment semantics.
    
    No changes are made to the return value or side effects of the function.
    
    
    
    **Summary of improvements:**  
    - All repeated list and dict lookups changed to locals for faster access.
    - All guards and returns are now at the top and out of the main logic path.
    - Increments and dict assignments use `get` and one-liners.
    - Removed duplicate lookups of `self.cur`, `self.cur[-2]`, and `self.timings` for maximum speed.
    - Kept the function `trace_dispatch_return` identical in behavior and return value.
    
    **No other comments/code outside the optimized function have been changed.**
    
    ---
    
    **If this function is in a hot path, this will measurably reduce the call overhead in Python.**
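The locals-binding and `dict.get()` set-or-increment patterns the commit message describes can be shown in isolation. The `MiniTimer` class below is a hypothetical illustration, not the actual `Tracer.trace_dispatch_return` code:

```python
class MiniTimer:
    """Hypothetical sketch of the two micro-optimization patterns."""

    def __init__(self):
        self.timings = {}

    def record_slow(self, key, elapsed):
        # Repeated attribute and key lookups on every branch
        if key in self.timings:
            self.timings[key] += elapsed
        else:
            self.timings[key] = elapsed

    def record_fast(self, key, elapsed):
        # Bind the attribute to a local once, then use get() for
        # set-or-increment in a single expression
        timings = self.timings
        timings[key] = timings.get(key, 0) + elapsed
```

Both methods produce identical state; the fast variant simply avoids a second dict probe and repeated `self.timings` attribute lookups on the hot path.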

    codeflash-ai bot commented May 30, 2025

    ⚡️ Codeflash found optimizations for this PR

    📄 25% (0.25x) speedup for Tracer.trace_dispatch_return in codeflash/tracer.py

⏱️ Runtime: 92.6 microseconds → 74.4 microseconds (best of 80 runs)

    I created a new dependent PR with the suggested changes. Please review:

    If you approve, it will be merged into this PR (branch tracer-optimization).

    @KRRT7 KRRT7 force-pushed the tracer-optimization branch from ee4c7ad to a34c6aa Compare June 10, 2025 01:50
    @github-actions github-actions bot added the workflow-modified This PR modifies GitHub Actions workflows label Jun 10, 2025
    codeflash-ai bot added a commit that referenced this pull request Jun 10, 2025
    …(`tracer-optimization`)
    
    Here is an optimized version of your program, focusing on the major bottleneck seen in the line profiler:  
    `stack.extend(ast.iter_child_nodes(node))` is taking **80%** of the total runtime.
    
    ### Main Optimization
    
    - **Avoid repeated iterator creation:**  
      `ast.iter_child_nodes` creates a generator each time, and extending with a generator is slower than extending with a list due to type checks and resizing on the list.  
      Changing this to `stack.extend(list(ast.iter_child_nodes(node)))` is often faster for small lists (due to C-optimized list logic).  
    - **Pre-memoize field lookups:**  
      `ast.iter_child_nodes` re-inspects _fields each time. Since you only traverse AST (no node mutation), accessing `_fields` directly and using them is faster.
    - **Better local variable usage:**  
      Move global lookups like `ast.Return` to locals for faster lookups.
    - **Use `is` for type checks when possible:**  
      Since `ast` node classes are not subclassed, `type(node) is ast.Return` is a micro-optimization.
    - **Micro-optimization:**  
      Replace `.extend()` with multiple `.append()` only if profiling (for very shallow trees), but since ASTs can be deep, bulk operation is preferred.
    
----
    
    
    
    ### If you want highest speed, *completely bypass ast.iter_child_nodes* as below.
    
    
    
    **This version skips all repeated lookups in ast.iter_child_nodes.**
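The "custom field walker" code itself is elided above; the sketch below is an assumption reconstructed from the bullet points (bind `ast.Return` locally, use `type(...) is` checks, iterate `_fields` directly instead of calling `ast.iter_child_nodes`), not the bot's actual output. Like the original, it descends into all children, including nested defs:

```python
import ast

_AST = ast.AST  # bind once at module level

def has_return_fast(function_node):
    """Sketch: DFS for a Return node without ast.iter_child_nodes overhead."""
    Return = ast.Return  # local binding avoids a global lookup per node
    stack = [function_node]
    while stack:
        node = stack.pop()
        if type(node) is Return:  # ast nodes are not subclassed in practice
            return True
        # Walk _fields directly, skipping non-node values like names and ints
        for field in node._fields:
            value = getattr(node, field, None)
            if isinstance(value, list):
                for item in value:
                    if isinstance(item, _AST):
                        stack.append(item)
            elif isinstance(value, _AST):
                stack.append(value)
    return False
```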
    
    ----
    
### Summary of what changed:
    - Pre-bind `ast.Return` and `ast.iter_child_nodes` to locals.
    - Use `type(node) is ast.Return` instead of `isinstance`.
    - Use `list(...)` inside `.extend()` for batch insertion.
    - Optionally, custom child iteration for maximum speed (bypassing `iter_child_nodes`).
    
    ---
    
    **You can choose either of the two optimized versions above depending on your use case.**  
    If you want only a drop-in fix, use the first rewrite. If you want _maximum speed_, use the custom field walker.  
    All existing comments are preserved.
    @KRRT7 KRRT7 marked this pull request as ready for review June 10, 2025 20:17

    Persistent review updated to latest commit fc4f2de

    @KRRT7 KRRT7 merged commit 36ce827 into main Jun 10, 2025
    18 checks passed
    @KRRT7 KRRT7 deleted the tracer-optimization branch June 10, 2025 20:41
Labels: Review effort 4/5, workflow-modified (This PR modifies GitHub Actions workflows)
